Volatility: a hidden Markov process in financial time series 
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04 ■ The volatility characterizes the amplitude of price return fluctuations. It is a central magnitude in 

' finance closely related to the risk of holding a certain asset. Despite its popularity on trading floors, 

the volatility is unobservable and only the price is known. Diffusion theory has many common points 
with the research on volatility, the key of the analogy being that volatility is the time-dependent 
£ — ■ diffusion coefficient of the random walk for the price return. We present a formal procedure to extract 

£Nj ' volatility from price data, by assuming that it is described by a hidden Markov process which together 

with the price form a two-dimensional diffusion process. We derive a maximum likelihood estimate 
of the volatility path valid for a wide class of two-dimensional diffusion processes. The choice of 
the exponential Ornstein-Uhlenbeck (expOU) stochastic volatility model performs remarkably well 
in inferring the hidden state of volatility. The formalism is applied to the Dow Jones index. The 
Q ■ main results are: (i) the distribution of estimated volatility is lognormal, which is consistent with 

O ' the expOU model and (ii) the estimated volatility is related to trading volume by a power law of 

& \ the form a tx y ' 55 ; and (iii) future returns are proportional to the current volatility which suggests 

C/j , some degree of predictability for the size of future returns. 

o ■ 

; PACS numbers: 89.65.Gh, 02.50.Ey, 05.40. Jc, 05.45.Tp 
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m ! I. INTRODUCTION 

> ■ 

00 ■ The volatility measures the amplitude of return fluctuations and it is one of the central quantities in finance [l[ . 

Investors sometimes place even greater emphasis on the level of volatility than on the market trend itself. The reason 
C\| ■ for this is mainly that the risk of holding an asset is classically associated with its volatility Q. The theoretical 
framework used to quantify aspects of price fluctuations has many common points with areas of physics dealing with 
noisy signals. The research of random diffusion aims to describe the dynamics of particles in random media and its 
methods have been applied to a large variety of phenomena in statistical physics and condensed matter Q . Time series 
& describing solar flares, earthquakes, the human heartbeat or climate records show strong correlations, multi-scaling, 
^ ■ non-Gaussian statistics and self-organized behavior [IJMS, 0] • These are properties also observed in financial time 
series where volatility is considered to play a key role |8[ . 

The picture that prices follow a simple diffusion process was first proposed by Bachelier in 1900 Q. Later in 1959, 
the physicist Osborne introduced the geometric Brownian motion and suggested that volatility can be viewed as 
the diffusion coefficient of this random walk (Io| . The simplest possible assumption - that it is a time-independent 
. ^ ' constant - lies at the heart of classical models such as the Black-Scholes option pricing formula [ll[ ■ More recently it 
has become widely accepted that such an assumption is inadequate to explain the richness of the markets' behavior 
5^ ' [l|. Instead, volatility itself should be treated as a random quantity with its own particular dynamics. 

Among its most relevant properties [l], [l2|, [IH, [ld[ , volatility is the responsible for the observed clustering in price 
changes. That is: large fluctuations are commonly followed by other large fluctuations and similarly for small changes 
P, i|. Another related feature is that, in clear contrast with price changes which show negligible autocorrelations, 
volatility autocorrelation is still significant for time lags longer than a year [H, [l2|, EH ■ Most of these studies introduce 
a subordinated process which is associated with the volatility in one way or another [lj| El, [l?], EH EH US HH HH US 
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The main obstacle of the appropriate analysis of volatility is that it is directly unobservable. As we have mentioned, 
volatility provides important information to traders but it is very unclear how reliable the estimates of such a hidden 
process can be. Investors use several proxies to infer the level of current asset volatility. The most common ways are: 
(i) to make it equivalent to the absolute value of return changes, (ii) to assume a proportional law between volatility 
and market volume prl I27L l28l . l29l | , and (iii) to use the information contained in option prices and obtain the so-called 
"implied volatility" which, in fact, corresponds to the market's belief of volatility 

This already raises the questions: What process is a proper model of volatility and how to adjust the possible 
parameters to describe various stocks and markets? Among the possible candidates, multifractals [2, E3, El, El, l20j 
and stochastic volatility models [TH, [2l|, [22|, [23|, [24|, HH, HH are the most promising. The fact that volatility is directly 
unobservable makes even more difficult to get a conclusive answer about the best model. This doubled challenge has 
deserved the attention from the most diverse disciplines that look at financial markets. All of them have converged on 
the description of what is technically known as realized volatility and on methodologies for reconstructing volatility 
path. Several procedures for reconstructing volatility have already been presented being more or less dependent on 
the volatility model chosen. Mathematics, econometrics, and finance research have a large number of publications 
during the last decade on this issue (see e.g. [H, H3, SH El, |H, H, [H, H, [H, [H, [H, \M, EH ) . 

Our research presents an alternative procedure to estimate volatility from the price dynamics only. Likewise most of 
the papers in the literature, the price dynamics is represented by a two-dimensional diffusion process: one dimension 
for price and second dimension for a volatility described by a hidden Markov process. Our case estimates the volatility 
subordinated time series through maximum likelihood but, in contrast with other studies, having fixed in advance the 
parameters of the model. 

We have decided to focus on a particular stochastic volatility model that is able to circumvent both mathematical and 
computational difficulties: the exponential Ornstein-Uhlenbeck volatility model [2l|, HI] although the same procedure 
can be used for a larger class of models. As its name indicates, the model assumes that the logarithm of the volatility 
follows an Ornstein-Uhlenbeck process, that is: a mean reverting process with linear drift. The resulting model is 
capable of reproducing the statistical properties of the financial markets fairly well [2l| . 

The paper is organized as follows. In Sec. |TT] we outline the general stochastic volatility framework and more 
specifically the exponential Ornstein-Uhlenbeck model. In Sec. IIIII we present a maximum likelihood estimator for 
a wide class of stochastic volatility models. For the case of the exponential Ornstein-Uhlenbeck (expOU) model we 
present Monte Carlo simulations to show that it performs remarkably well in inferring the hidden state of the volatility 
process. In Sec. IIVI the procedure is applied to the Dow Jones Industrial Average. Conclusions are drawn in Sec. [V] 
and some more technical details are left to the appendices. 



II. STOCHASTIC VOLATILITY MODELS 

The geometric Brownian motion (GBM) [l(| is the most widely used model in finance. In this setting the asset 
price S(t) is described through the following Langcvin equation (in Ito sense): 

^=Vdt + *dW 1 (t), (1) 

where a is the volatility, assumed to be constant, (i is some deterministic drift indicating an eventual trend in the 
market, and W\(t) is the Wiener process. We define the zero-mean return X(t) as 

X(t) = In [S(t + t Q )/S(t )} - (In [S(t + t )/S(t Q )}) , (2) 

where the symbol (• • • ) designates the average value and t is the initial time which is usually set to be zero. In terms 
of X(t) the GBM is simply written as 

dX(t) =adWi(t). (3) 

However, especially after the 1987 market crash, compelling empirical evidence has become available that the 
assumption of a constant volatility is doubtful @ . Neither is volatility a deterministic function of time as one might 
expect on account of the non-stationarity of financial data, but a random quantity [22l | . 

In the most general setting one therefore assumes that the volatility er is a given function of a random process Y(t): 



a(t) = f[Y(t)). 



(4) 
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Most stochastic volatility (SV) models assume Y(t) is also a diffusion process that may or may not be correlated with 
price. The main difference between various models is only the parametrization of this scheme. In a general notation 
the zero- mean return X(t) defined above is described by the following set of stochastic differential equations 

dX(t) = flY^dWxit) (5) 
dY(t) = -g[Y(t)]dt + h[Y(t)]dW 2 (t), (6) 

where dX = dS/S — (dS/S) and /, g and h are given functions of Y(t). As shown in Eq. f[Y(t)] corresponds to 
the volatility, i.e., the amplitude of return fluctuations. However, since f(x) is usually chosen to be a monotonically 
increasing function, it is not misleading to think of Y as a measure of volatility. Thus, as far as there is no confusion, we 
will refer to the process Y(t) as "volatility" as well. On the other hand, the function g[Y(i)] describes a reverting force 
that drives the volatility toward the so-called "normal level". This force brings the volatility process to a stationary 
regime for long time horizons and the normal level is related to the average volatility in that limit. Finally, the 
subordinated process Y(t) may have a non-constant diffusion coefficient defined in terms of the function h[Y(t)} which 
is called the volatility-of- volatility ("vol of vol"). The functions g and h fully describe the volatility process. The 
resulting dynamics is comparable with the one described by a Gaussian particle trapped in a potential well V(y) 
whose associated force is —g{y), where g(y) = V'(y). In finance one typically proposes convex potentials with only 
one minimum whose value is related to the normal level of the volatility. 

In what follows we will mostly work with one particular SV model, the exponential Ornstcin-Uhlcnbcck (cxpOU) 
model, which follows from the substitutions: 

f(x) = me 1 , g(x) = ax, h(x) = k, 

that is, 

dX(t) = me Y ^dW 1 (t), (7) 

dY(t) = -aY(t)dt + kdW 2 (t). (8) 

Note that in this model the process Y(i) is precisely the logarithm of the volatility, or "log- volatility" for short. The 
main statistical properties of the model are thoroughly discussed in Ref. [2l[ . We simply recall that the stationary 
distribution of the process Y(t) is a Gaussian (i.e., a lognormal distribution for a) with zero mean and variance (3: 

p(y) = -^=cxp(-y 2 /2(3), (9) 

y Z7T/3 

where 

(3=k 2 /2a. (10) 



III. VOLATILITY ESTIMATION 



A. The Wiener measure and volatility estimation 

Let X and Y denote a simultaneous realization of the variables X(t) and Y(t) in the time interval t — s < r < 
t. Omitting all Y-indepcndent terms, we show in Appendix [A] that the probability density (likelihood) of such a 
realization is approximately given by 



InP(X,Y)~~ f 

^ J t—s 

Before proceeding further, we will discuss the meaning of this expression. We first note that Eq. pd|) has to be 
understood in the sense of generalized functions [42j since the Wiener process is only differentiable in this sense and, 
hence, X(t) and Y(t) do not exist as ordinary functions and Eq. is just a symbolic expression. Nevertheless, the 
formula is still valid when the integral and the derivatives therein are discretized with arbitrary small time steps, a 
requirement that is indeed necessary for numerical computations. 

Let us now see some qualitative properties of Eq. pip. The first summand measures the fluctuations of the 
zero-mean return with respect to the volatility, [X (t) / f (Y (t))] 2 , and their contribution to the likelihood (probability 



Mr) 
f[Y(r)] 



dr-- 
2 



Y + 9[Y(t)\ 
h[Y(r)] 



dr 



(11) 
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density) of a given return realization. Note that the higher this contribution is, the lower those "relative" fluctuations 
are. In the same fashion, the second summand in Eq. pip measures the fluctuations of the volatility process Y(t) 
with respect to the vol of vol h(Y), although in this case these fluctuations are gauged with the mean reverting force 
—g(Y). As before, the lower these fluctuations, the higher their contribution to the log-likelihood (TTT|) . 

While Eqs. (J5j)- (J6]) represent a joint model for return and volatility, the stock market data only include recordings 
of the return process X . The Y process and, hence, the volatility f(Y), must be inferred indirectly in a Bayesian 
fashion through Eq. (fTTj) . Indeed, the conditional probability density that the realization of the hidden Y"-process is 
Y, given that the observed return is X, reads 

lnP(Y|X) = lnP(X,Y) -lnP(X). (12) 

In consequence, we can find the maximum likelihood sample path of the (hidden ) volatility process by maximizing Eq. 
ilty) with respect to Y (recall that Y is a realization of Y(r) in the interval t — s < r < t). Since the second summand 
of Eq. (|12p is independent of Y, we can neglect P(X) in this maximization process. Therefore, the maximization of 
lnP(Y|X) yields the same result as that of lnP(X, Y). 

Note that, besides the specification of the stochastic volatility model (that is, the explicit forms of /, g, and h), the 
only free parameter is s: the duration of past return data to take into account. After substituting the observed return 
history as X, we will obtain by maximum likelihood the quantity: 

Y = argmax Y lnP(Y|X) = argmax Y lnP(X, Y). (13) 

We should mention that similar maximization problems have been studied in the context of hidden Markov models, 
where this procedure is called "decoding". When the state space (the number of possible X and Y values) is finite, the 
optimization can be done exactly by the Viterbi algorithm [431 ] , while there has been limited success in the continuous 
case [441 ] . A similar technique has been applied to the forecasting of volatility assuming a binomial cascade model which, 
unlike stochastic volatility models, has a finite state space (see Ref. [13] )■ More focused on the stochastic volatility, 
we should also mention the efforts based on Kalman or Particle filtering, Bayesian inference, conditional likelihood or 
Fourier methods among other similar techniques (see e.g. Refs. [H, H3, H3, [H, [H, [H, [35|, [H, H3, [H, [H, 0, 5l[ ) . Most 
of the cited works assume an specific model, although they never focuss on the expOU model. In addition, they are 
mainly worried about intraday (high-frequency) data while we are here focussed on reproducing daily (low-frequency) 
data. It deserves an special attention the work on a superposition of Levy Ornstein-Uhlenbeck processes for a 2 crafted 
to describe high-frequency (intraday) data and provide a power law slow decay for the volatility autocorrelation [T5II32I]. 
All the techniques mentioned above provide an efficient way to reproduce the volatility path but, in contrast to our 
case, the method also serves to estimate the parameters of the model. We have decided to provide the parameters 
beforehand thus using an independent way of estimating them (see Sec. IIV A| . It should be left for a future research 
an accurate comparison between our method and others. We may look for a way to implement parameter estimation 
in our method. 

As we have already stated, our main objective is to design a method able to filter the Wiener noise dW\(t) out of 
Eq. and thus to obtain a reliable estimate Y(t) of the hidden volatility process Y(t). The method, an extension 
of the deconvolution procedure previously presented in plj . has basically the following five steps: 

(i) We simulate a random sample path of the Wiener process dWi(r) for t — s < r < t. 

(ii) Then a surrogate realization of Y is generated as 



Y(r) = r 1 



dX(r) 



dWx{T) 



(14) 



where t — s < r < t. Note that this equation requires that f(x) is invertible which implies that f(x) be chosen 
to be a monotonic function. 

(iii) Substitute Y s and X into Eq. (fTTj) to calculate the log-likelihood of this realization. 

(iv) Iterate (i)-(iii) for / steps, keep the highest likelihood random realization (the conditional median) and assume 
this to be the proper estimate Y(t). 

(v) The estimate of the hidden process at time t is then Y(t). The estimate of the volatility is given by 



a(t) = f[Y(t)]. 



(15) 
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B. Interpretation of the method 



Let us further elaborate the meaning of such an estimate. In finance, the volatility is often identified with the 
absolute value of returns variations. Indeed, as a first approximation, we can replace in Eq. ^ the noise term by its 
expected value and write 

r ,. \dX(t)\ 

which shows that the volatility is approximately proportional to the absolute returns. Eq. ([5]) can be thus thought of 
as a first approximation toward estimating volatility. Our method, based on the maximization of Eq. (|12|) . takes this 
estimation two steps further. In effect, the first step was taken in Ref. [2l[ where we replaced the average (|G?Wi(i)|) 
by a simulated sample path. We are now taking a second and more refined step in which we are not only replacing 
the Wiener noise by a random simulation but, in addition, we perform the maximum likelihood method described by 
items (i)-(v). 

Thus we arc basically separating the observed returns dX(t) into two sources: a(t) and dW\{t). To do this, we 
have first considered a specific form of stochastic volatility. Secondly, we have taken the driving Wiener noises dW\ (t) 
and dW2(t) appearing in Eqs. ((5]) and ([6]) to be uncorrelated. Finally, we have assumed that a(t) is approximately 
constant over the time step during which we numerically evaluate the derivatives Y(t) and X(t) appearing in Eq. 
(fTTj) . We incidentally note that if h(x) = - the vol of vol is equal to zero - then the stationary solution of Eq. ^ is 
Y(t) = 0. Thus the model reduces to the Wiener process in which the volatility is constant and absolute returns are 
uncorrelated @. 



C. The performance of the estimator 

In order to test the performance of the estimator, we simulate the expOU process by using Eqs. 0-© with the 
realistic parameters obtained in Section HVl The relationship between the simulated value of the log- volatility Y(t) 
and its estimate Y{t) is given in Fig. [1] The two quantities agree within error bars, so we may state that 

Y{t) « Y(t) (in mean square sense). 

In what follows we will always use s = 10 days of past data and I = 10 5 iterations for maximization. The time step 
for discretization will be At = 1 day. For random optimization, the necessary number of iterations / grows very fastly, 
perhaps exponentially, with s/ At. The values of s and At cited here were chosen to keep the task computationally 
feasible. As for the value of /, in this case an increase to I = 10 6 does not improve the estimates noticeably. The 
estimates generated with this parameter set have negligible bias and they can efficiently distinguish between low and 
high volatility periods (see Appendix [B] for a discussion on the robustness and possible bias of the procedure) . 

An additional verification of the good performance of our estimate is shown in Fig. [2J where we give the actual 
sample path of Y for a single realization of the expOU model together with the estimated Y. As we can see the 
estimate follows the true log- volatility Y(t) very closely. 



D. Volatility forecasting 

Another possible approach to the utility of the expOU model is to evaluate its forecasting performance. One can 
for example use the mean error of absolute returns \dX(t + h)\ for a given Monte Carlo simulated path: 

E(h) = (j\dX(t + h)\ - \dX(t + h)\fj, (17) 

where \dX(t + h)\ is our estimate for the absolute return \dX(t + h)\ given an estimate Y(t) while h is the time 
horizon (number of days) to forecast ahead. We recall that for our estimates we take the conditional median of our 
MonteCarlo simulations. Thus for the expOU model we have 

\dX(t + h)\ =M[\dWi(t + h)\]mexp[Y(t)exp(-ah)], (18) 

where M[|(iVFi(f + /i)|] denotes the median of the absolute value of Wiener increment. Figure [3] compares five methods 
to forecast \dX(t + h)\ using only information available before time t. For clarity we also give the titles used in Fig. 
1 in brackets ("..."): 
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Figure 1: Estimated log- volatility Y as a function of the actual log- volatility Y taken from 2 x 10 5 simulations of the expOU 
model. Reconstruction used the last s — 10 values of returns, and I = 10 5 iterations. The error bars represent the 25% and 
75% quantiles of the distribution estimated volatility. 




Figure 2: Estimated and actual volatility for a typical sample path of the expOU model. The estimated values were smoothed 
by 5-neighbor averaging to reduce noise. 



1. The median of the last 5 absolute returns. This method clearly has predictive power for h < 150 days. ("5-day 
abs. ret.") 

2. The median of the last 15 absolute returns. The longer averaging period brings a substantial improvement for 
short-term forecasts, but not for long-term. ("15-day abs. ret.") 

3. Eq. (fl~8|) with Y(t) = Y(t), which is the true value in the underlying simulation of the expOU model. This gives 
a substantial decrease in forecasting error, which persists up to h ~ 500 days. This curve is also a theoretical 
lower bound for error achievable by the expOU model in any time series: Here the underlying data are perfectly 
described by expOU, and the parameters and Y(t) are known "perfectly". Neither of these usually happen in 
real data, and so there one expects worse performance, ("perfect") 

4. Eq. ([T8|) with Y(t) estimated by the reconstruction procedure of Sec. IIIIl This estimator does not perform well, 
due to noise in the optimization procedure. ("1-day forecast") 

5. Eq. (fT8|) with Y(t) estimated by the reconstruction procedure of Sec. IIIIl then averaged for the last 5 days. 
Such averaging greatly decreases noise, and the accuracy of the forecast is improved for short times (cf. Fig. [2]). 
("5-day forecast") 
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Figure 3: Median forecasting error E(h) of absolute returns calculated according to Eq. (|18fl by five methods. The errors were 
normalized by the lowest level of error achievable with the assumption of a time-independent constant volatility (horizontal 
dashed line). The result was averaged over 150000 independent realizations of the expOU process, parameters were chosen as 
outlined in Sec. IIV Al 

Finally, note that for h ~ 500 days all estimators based on the expOU model lose any information included in Y(t), 
and converge to the same error level. This is consistent with real data (cf. Sec. IIV Dp . 

IV. APPLICATION TO STOCK MARKET DATA 

In this section we present an application of the method to actual stock market data. We analyze the Dow Jones 
Industrial Average (DJIA) index in the period 01 /01/1900-05/26/2006, a total of 29, 038 days. In order to work with 
zero-mean returns, the mean return was subtracted from the actual data. Trading volumes for the index are only 
available for the period 04/01/1993 - 05/26/2006, a total of 3, 375 days. 



Parameter estimation 



We recall that in the estimation procedure presented here one necessarily needs to assume a theoretical model for 
the volatility. Having done this, the next step is to estimate the parameters involved in the model chosen. For the 
expOU model, Eqs. (O-©, these parameters are: to, k and a. 

Before proceeding further we remark that time increments in real data have a finite size since the market always 
works on discrete times (for daily data the minimum time increment is 1 day). Thus, in practice, the (infinitesimal) 
return variation dX (t) = X(t + dt) — X(t) corresponds to a (finite) return increment A A" (t) = X (t + At) — X (t ) where 
At is the time step between two consecutive ticks. Also the Wiener differentials dW(t) correspond, in mean square 
sense [45[, to the increments 

(19) 



AW(t) ps e(f)VAt, 



where e(t) is a Gaussian process with zero mean and unit variance [45j |. In the present case our time step has a fixed 
width and is equal to At = 1 day. 

Coming back to the estimation of parameters, we show in Appendix [Q that 

lnm ps ( 7 + ln2)/2+ (Jin (j AA|/VaT) ^ , (20) 

where 7 = 0.5772 • • • is the Euler constant. Taking into account that the third summand can be evaluated from data 
we see that Eq. (f2"0| provides a direct estimation of m. 

On the other hand, if the expOU model is appropriate then the empirical estimate Y(t) of the hidden volatility 
Y(t) should also be a Gaussian process with a stationary distribution of zero mean and variance given by j3 = k 2 /2a 
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[see Eq. (|10[) ]. As shown in Fig. [5J if one takes (3 = 0.61 ± 0.05, the distribution of Y{t) is Gaussian and coincides 
with the theoretical distribution of Y(t) given by Eq. @. The assumption of a Gaussian distribution for our estimate 
is robust and it holds for a wide range of parameters. 

To fully specify the model we have to obtain the parameter a. We have chosen the value found in Ref. [2l[ which 
was obtained in order to capture the long range correlations of the volatility, at least up to 500 days. The model is 
able to provide the appropriate long range behavior with an infinite sum of exponentials but it is also true that it 
does not provide a pure power law decay like the models in Refs. [l5lll7||. In any case, we have seen in Ref. [2l[ that 
at least for daily data our approach is satisfactory. Our final set of parameter estimates is thus 

m = (7.5 ± 0.5) x 10~ 3 days~ 1/2 , 



a = (1.82 ± 0.03) x 10 -3 days" 1 , 

and 

k = (4.7 ± 0.3) x 10~ 2 days" 1/2 . 

The errors were determined based on Fig. [4j similarly to the error of f3. In this parameter range the distributions of 
the estimated Y(t) and simulated Y(t) series agree well. The results reported throughout this paper are insensitive 
to the misspecification of these parameters, even beyond these error bars (see also Appendix |B|) . 

The distributions of log- volatility are compared in Fig. 0]for four cases: our maximum likelihood procedure applied 
to Dow Jones, a simulation of cxpOU, the simple estimate of Eq. (|21[) and the deconvolution procedure introduced 
in Ref. \2M which can be written as 



Y d ccon(t) = In — 

171 



dX{t) 



dW x {t) 



(21) 



where Wi(t) is a simulation of the Wiener process. Note that this deconvoluted log-return estimator has indeed a 
Gaussian distribution but with a larger variance as hinted in Fig. 2]in view of its wider density. We finally mention that 
the estimate for the log-volatility \n(a/m) sa In \dX\ given by Eq. (flU]) shows a non Gaussian and biased distribution 
as was also reported in Ref. [2l[. This suggests that l^decon is an appropriate "null model" to contrast with Y. Both 
quantities are generated by dividing dX(t) by the increments of a realization of the Wiener process and then taking 
the logarithm of the absolute value of this ratio. The difference lays in the fact that Y takes the realizations that 
satisfy a maximum likelihood requirement while Idecon takes a Wiener process realization that is purely random. In 
such a way, our method keeps the divisor correlated with dX and, as we will see next, it seems to conserve clustering 
and memory effects in the log-volatility time series. 



B. Clustering and the estimated volatility 

To support our claim that the technique presented is powerful enough in filtering the noise dW\ out of returns we 
will give a visual comparison based on the following qualitative experiments. 

Figure [5] [Left] displays a comparison over a 1000-day time interval. One can observe there that the noise level in 
Y is substantially smaller than in Ydccon, as also inferred from Fig. |U 

In order to show that such a correlation is responsible for suppressing large fluctuations in the ratio, we can perform 
a second experiment. Thus in Fig. O [Right] we see a comparison between the logarithm of absolute returns variations, 
In \dX\, and the estimated volatility Y, Note that the proper clustering of volatility becomes clearly visible. 



C. A comparison with trading volume 

The hidden nature of the volatility process has been addressed by several authors [2(1, [2?], For instance, Ref. 
[2(| suggests that, instead of the volatility, a good estimate would be the square root of the daily trading volume. 
That is, 

M[a(t)} cx V{t) a , (22) 

where a = 0.5 and M[-] denotes the median. In Fig. [5] we show evidence that supports this assumption. Again, 
the first estimation for the volatility is Therefore, we regress ln|c?X(t)| versus \nV(t) as shown in Fig. [5] 
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Figure 4: A comparison of estimates of log- volatility for Dow Jones. Black boxes ("estimated Y, D JIA") represent our maximum 
likelihood method applied to empirical data. Empty circles ("simulated Y, expOU") represent the distribution of the simulated 
sample path of the log- volatility, assuming that Y(t) follows the expOU model. We also plot the empirical distributions of two 
estimates: the red line ("In [return], DJIA") was obtained through Eq. (|16[l and the black line ("In [deconv. ret], DJIA") via 
Eq. dHJ. 




Figure 5: [Left] A comparison between the estimate Y(t) and Y& ecorl (t) for a typical 1000-day period of Dow Jones. These curves 
were not smoothed in order to show the substantial reduction of both the noise level and the asymmetry in Y(t) compared to 
^decon(i) with a random approximation of the noise term dW. [Right] The estimate Y(t) and the logarithm of absolute return 
variations for the whole sample of Dow Jones. 



In the same figure we also present the regression between the maximum likelihood estimate Y(t) and hiV(t) which 
appears to be less noisy than the former regression in accordance with the smaller variance of Y(t) compared to that 
of ln|dAT(i)|. Nevertheless, the exponent a « 0.55 is the same for both regressions. There have been similar [Z(|, 
albeit controversial [13, [H|, findings for the price impact of single transactions. However, Eq. does not yet imply 
that volatility is proportional to volume, only that its typical value is (i.e., the median). Fluctuations around the 
average behavior due to changes in liquidity might have a key role in the process (48j . 
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Figure 6: Logarithm of daily absolute return and estimated log-volatility Y as the function of daily volume. Days with similar 
volumes were binned for better visibility. The symbols represent the medians, and the error bars the 25 — 75% quantiles in the 
bins. 



D. The predictive power of volatility 
From Eqs. ([T])-©, we know that for the expOU model simple relationship can be given between ln|dX(t)| and 

Y(ty. 

\n\dX(t)\ =ln(m\dW 1 (t)\)+Y(t). 
Therefore, the conditional median of In |gLY(£)| given Y(t) is 



M 



In|dX(t)| 



Y(t) 



const. + Y(t). 



(23) 



We point out that this relationship implies some degree of predictability of the absolute changes in return through 
their median, if one knows the current value of the log- volatility Y(t). We test Eq. (|23[) for real data and with Y(t) 
replaced by its estimate Y(t). As shown by the bottom curve of Fig. [7l the slope of the linear regression between 
M[ln |F] and Y(t) is not equal to 1 - as would have been implied by Eq. (f2"3"| - but 0.9 which still suggests 

strong predictive power. 

Recall that the minimum time step of the empirical data used is 1 day. Hence, Eq. (|23p implies the prediction of 
tomorrow's return based on today's volatility and return. Wc now want to extend the prediction horizon. To this end 
we generalize Eq. (|23[) and propose the following ansatz: 



M 



]n\dX(t + h)\ 



Y(t) 



const. +^(h)Y(t), 



(24) 



where h = 0, 1, 2, ■ • • . In Fig. [7] we test this ansatz for several values of the horizon: h = 0,5, 20, 100 and 1000 days. 
We find that the slope "f{h) is a decreasing function from the value 7(0) = 0.9 to practically zero when h = 1000 days 
which means a complete loss of memory. Note that, when h = 100 trading days we have 7 = 0.25 still implying a 
slight degree of prediction after approximately five months, which is of the same order of magnitude than the DJIA 
characteristic time scale, 1/a ~ 500 days, for the relaxation of the volatility plj . 



V. CONCLUSIONS 



The volatility is a crucial quantity for financial markets since it provides a measure of the amplitude of price 
fluctuations. Traders try to follow carefully the level of volatility because it gives the perception of the risk associated 
with any asset. Although volatility was originally conceived to be the diffusion coefficient of the price return random 
walk, there is compelling evidence not to consider it a constant, but a subordinated random process. The framework is 
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Figure 7: The proportionality between the estimated volatility and of the logarithm of absolute return variations. In order to 
decrease noise, 5-day moving averages have been used. Numbers on the right indicate the slopes of the corresponding regression 
lines. Time shifts from bottom to top: h = days (■), 5 days (•), 20 days (a), 100 days (▼), 1000 days (♦). Days with 
similar absolute returns were binned for better visibility. The symbols represent the medians, and the error bars the 25 — 75% 
quantiles in the bins. 

analogous to that of random diffusion processes which have been applied to a large variety of phenomena in statistical 
mechanics and condensed matter physics. 

The main obstacle to get a better knowledge of the volatility's nature is that it is not directly observed. In fact, this 
is precisely the motivation behind the present research. Our main objective has been to develop a tool which visualizes 
the sample path of volatility. The procedure derives a maximum likelihood estimate assuming that the volatility is 
a hidden Markov process. To do so, one needs also to assume a specific model for the volatility. We have chosen 
a class of two dimensional diffusions commonly known as stochastic volatility models, where the volatility acts as a 
diffusion particle trapped in a potential well. We have focused on the expOU model and obtained promising results, 
especially for three reasons: (i) the model is computationally feasible; (ii) its parameters can be easily obtained and 
fit the data reasonably well; and (iii) the distribution of the estimated volatility is log-normal, which is consistent 
with the assumed expOU model. 

We have shown for the Dow Jones index daily data that the sample path of our estimated volatility improves other 
estimates. We have compared our estimation with a rather typical one which identifies volatility with absolute return 
changes. Our estimation is able to remove the existing bias in the stationary distribution of volatility while still 
preserving the clustering in volatility time scries. We have also studied the estimate of volatility that deconvolutcs 
the return by the simulation of a random Wiener path [2l| . This last procedure also provides a Gaussian distribution 
for the log-volatility, albeit the distribution has too fat tails and pays the price of losing clustering and memory in 
the volatility time series. Our new procedure is in fact a more sophisticated variant of this estimate since it filters out 
Wiener realizations via maximum likelihood. The estimate drastically reduces the noise in the volatility path thus 
preserving data clustering. In this way we have thus proposed an alternative methods to those already provided by 
mathematical finance literature (see e.g. [H, H3, l3ll l32l. [H, [HI, [H, [H, H3, [H, [H, H(| EI)- ^ should however be 
left for a future research an accurate comparison between our method and others. We may even look for a way to 
implement parameter estimation in our method. 

The median of the estimated volatility has also been related to trading volume by the power-law expression M[a] oc 
t/0.55^ jjjjj between volatility and trading volume has been previously mentioned in different studies however our 
estimate is again capable to provide a less noisy regression. We must, indeed, stress the fact that this does not imply 
that volatility is proportional to a power of the volume, but only that its typical value is and that fluctuations around 
the average might play an important role. 

We have also seen that current returns are proportional to the estimated volatility, as otherwise expected. How- 
ever, the main novelty is that we have observed how future returns are proportional to current volatility and their 
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predictability diminishes monotonically with the number of time steps ahead. This last finding implies that our 
estimation method can be applied to predict the size of future returns with the knowledge of current volatility. 

As a final remark we stress the fact that the technique herein presented can be applied to a variety of physical 
phenomena besides finance. One typical problem of this sort is provided by the Brownian motion inside a field of 
force in which inertial effects are not negligible (49[. In this situation the dynamics of the particle is described by 
a two dimensional diffusion process (X(t), V(t)) representing the position and the velocity of the Brownian particle. 
The maximum likelihood technique might provide a reliable estimate of the velocity in the case that, for instance, the 
only accessible experimental measures are the positions of the particle at wide time steps, so that a measure of the 
velocity - which implies the knowledge of two very close positions - is too noisy and unreliable. 
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Appendix A: DERIVATION OF THE LIKELIHOOD FUNCTION 

In order to make notations more compact, in this Appendix the time dependence of the stochastic processes is 
mostly indicated as a lower index. A generic stochastic volatility model is defined as 

dX t = f{Y t )dW 1 (t), (Al) 
dY t = -g(Y t )dt + h(Y t )dW 2 (t). (A2) 

To explain the procedure it is more convenient to work with the discrete time version of the model. To this end, 
suppose that At is a small time step and that the driving noises in Eqs. (|Alj) - (|A2|) can be approximated by (c/Eq. 



dWi(t)ttSi(t)VAt, (t = l,2), (A3) 

where £i(t) are independent standard Gaussian processes with zero mean and unit variance. We remark that the 
approximation (|A3p has to be understood in mean square sense (45| . The discrete time equations of the model thus 
read 

X t - X t _ At = /(y t _ At )£i(i - At)VAi (A4) 

Y t - Y t _ At = -g(Y t „ At )At + h(Y t _ At )e 2 (t - At)^At (A5) 

from which we get 

e l {t-At)= (A6) 

/(y t _ At )VAF 

e 2 (t-At) = Y *- Yt -* t+9 ^ At . (A7) 

For a given number of realizations, the probability of the set {X T , Y T } (r = t — At, t — 2At, . . .,t — s) can be easily 
obtained, as we will see next. 

Let us denote the set of realizations as {X, Y}. Then the Markov property of the process ensures that one can 
decompose the joint probability density function (pdf ) of this set as a chain of products between conditional probability 
densities. In consequence, the pdf of the whole sample path can be written as 



t-At 

p({x,y}) = p(Av s ,y t _ s ) Yl F(x T ,Y T \x T _ At ,Y T _ At ), 

r—t — s 



(A8) 
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where the first term, ¥(X t - s , Y t - S ), corresponds to the initial realizations of X and Y s/At time steps far from the 
present time t; all the remaining terms of the form f(X T , Y T \X T _Ati Y T _At) are the conditional pdf's for transitions 
between consecutive steps: 

(X T -At, Y T -At) — ► (X T ,Y T ). 



The logarithm of Eq. (|A8J) is 



t-At 



lnP({X,Y}) = lnP(AV s ,y t _ s ) + \nP(X r ,Y T \X T _At,Y T _ At ). 

r—t — s 

On the other hand from Eqs. (|A4|) - (|A5|) we realize that 

¥{X Tl Y r \X T _At,Y T _At) = \J\P(sx(T-At),s 2 (T-At)), 

where \ J\ is the Jacobian of the transformation (X T , Y T ) — > — At), e 2 {t — At)) defined by Eqs. 

is, 

1 

\J\ = 



(A9) 



3, that 



f(Y T -At)h(Y T -At)At 
But £i and £2 are independent standard Gaussians, hence 

P(e 1 ,ea) = (l/27r)ffltp[(e?+e2)/2] ) 

whence 



F(X T ,Y T \X T _ A t,Y T -At) 



l/(27rAt) 



f{Y T -At)h(Xr-At) 



exp 



£?(t- At)+sl(r- At) 



(A10) 



Substituting Eqs. (|A6[) - (|A7p into this equation and the result into Eq. (|A9[) we finally get 



i-Ai 



lnP({X,Y}) = _ sln (2^At) _ ^ [ m/( y r _ At)+m/l( y T _ At)] 



t-At 



inP(x t _ s ,y t _ s )-i £ 



X T — Xf- 



Al 



i-Ai 



Y T -Y T _At 9(Y T -At) 

h(Y T -At)At h(Y T —At) 



f(Y T _ At )At 
-1 2 



At 



At. 



(All) 



Let us briefly explain the origin of some of these contributions. The first summand comes from the normalization 
constant of the Gaussian distribution (|A10|) . It appears in every conditional probability density and this is the reason 
for the factor s/At, which is the number of time steps between t — s and t. The resulting term does not depend on 
the realization, so that we can neglect it for a maximization with respect to Y. The term also goes to — 00 in the 
At — > limit, which means that any individual realization has a probability measure zero. 

The second summand is mostly the sum of the Jacobian transformations of each transition probability and depends 
on Y . Stochastic volatility models typically assume that these / and g [cf. Eqs. (|A1|) and (|A2|) ] are continuous and 
monotonically increasing functions or even constants. For instance, in the expOU model we have /(x) = mexp(x) 
and g(x) = k. Because of this, we will also neglect this term in the maximization procedure. The contribution shifts 
the maximum at the excessive cost of adding more noise to the numerical computations. We can however look at 
the situation from another point of view. Ignoring this term is equivalent to omitting the Jacobian transformation 
between the two probability density measures [cf. Eq. (|A10[) ] . In this way, we are stating that what we are really going 
to maximize is the probability of the realization of £\{t) and £2(t) -instead of Y(t)- in terms of the past history of 
the process expressed by {X, Y}. 

The term lnP(X t _ s , Y t - S ) is fixed by the initial conditions of the process. If we assume a known initial return X - 
which can be set to zero - and take a random Y t - S following the stationary distribution P st (Yt_ s ) given by Eq. ([5]), 
then P(X t _ s ,F t _ s ) = 5(X t _ s - X) P st (Y t _ s ) and hence 



lnP(AV s ,y t _ s ) = lnP st (r 4 _ s ) +ln<5(AVs - X). 



(A12) 
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Figure 8: Estimated log-volatility Y as a function of the actual log-volatility Y taken from 29038 simulations of the expOU 
model in four ways. Reconstruction uses the last s = 10 values of returns, and I = 10 5 iterations. The error bars represent the 
25% and 75% quantiles of the distribution estimated volatility. 

Had we taken another initial condition, the technique would have given equivalent results (we have checked this by 
using several initial distributions). For this reason and in order to improve the convergence of the maximum likelihood 
estimate we have neglected also this contribution. 
We therefore write 

1 t— Ai 

lnP({X,Y}) ~ -- £ 

r=t- 

1 t— At 

- - y 

2 ^ 

T—t — S 

We can represent this equation in the continuous time framework if At is sufficiently small and if /(a;), h(x) and g(x) 
are continuous. In such a case, Eq. (|A13p yields the result given in Eq. (fTTj) . 



Appendix B: ROBUSTNESS OF THE PROCEDURE 

In order to show that our volatility estimation procedure is robust, we carry out four measurements along the lines 
of Sec. IIII Cl We generate artificial time series of the same length as the DJIA dataset, with the same parameters as 
therein: m = (7.5 ± 0.5) x 1CT 3 days" 1/2 , a = (1.82 ±0.03) x 10~ 3 days -1 , and k = (4.7 ±0.3) x 10~ 2 days" 1/2 . 

Then we reproduce Fig. [T] in several variations: 

(a) In a way identical to the paper as shown in Fig. [TJ 

(b) Data generated as in (a), but for reconstructing Y we use the above parameters minus twice the error specified 
above. 

(c) Data generated as in (a), but for reconstructing Y we use the above parameters plus twice the error specified 
above. 

(d) In a way identical to (a), but the data is detrended by the Monte Carlo sample mean /i. We however remind that 
the process is driftless as given by Eq. ([3]). In this way we want to check whether detrending causes a systematic 
bias in the estimation. 




Y 



X T — A r _At 
f(Y T ^ At )At 

Y T - At , 



At 



h(Y T -At)At h{Y T -At). 



At 



(A13) 
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Figure 9: Estimated log-volatility Y as a function of the actual log-volatility Y taken from 29038 simulations of the expOU 
model in three different levels of "detrending error". Reconstruction used the last s = 10 values of returns, and / = 10 J 
iterations. The error bars represent the 25% and 75% quantiles of the distribution estimated volatility. Note that there only 
exists a clear deviation when the error is 100 times the true bias. 

The results are given in Fig. [8J The plot shows, that either option gives the same width of the Y distribution, and 
the bias introduced by either specification error is small. We have also run similar simulations but with very different 
parameters values and the procedure still provides consistent results similar to those given by Fig. [8j 

Finally, a more detailed test along the lines of point (d) above can be performed (that is: a test on wether the 
detrending causes a bias in our estimation procedure). We thus have repeated the test (d) also including errors 10^t 
and 100/i. The procedure tolerates up to 10 times more detrending error than expected in the real dataset. Only, 
when we have about 100 times the error, the reconstruction shows a strong upward bias for low volatility periods as 
shown in Fig. O All these results imply that possible errors in the detrending procedure could affect our procedure 
only when any wrong specification of the drift is far outside the error domain involved in our DJIA data set. 



Appendix C: DERIVATION OF EQ. (20) 

We start from Eq. IJ7J which we write in the approximate form 

AX(t) ~ me Y ^AW{t), 
thus In \AX(t)\ = m + Y(t) + In |AVFi(t)| and, taking into account that (Y(t)) = 0, we have 

(ln|AX(i)|) ~lnm+(ln|AVFi(t)|). (CI) 
On the other hand, we know that AWi(i) ~ e^/At, where e is a standard Gaussian variable (c/Eq. (fH))) ). Hence 

(ln|AVFi(t)|) « (ln|e|) + (lnAi)/2. 

But 

1 



(ln| £ |) = 



~ e / 2 ln|e|cie, 



which, after a simple change of variables inside the integral, yields [50|, [51( 



16 



(In lei) = —= / X - 1/2 e- x/2 \nxdx = -y/^/2h + In 2), 
2\/2Wo 

where 7 = 0.5772 • • • is the Euler constant. Therefore, 

(ln|AWi(i)|) « (lnAi)/2- ( 7 + ln2)/2. (C2) 
Substituting Eq. jC2]) into Eq. JUT]) proves Eq. (120)) . 
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